Applying Maximum Entropy to Robust Chinese Shallow Parsing

Authors

  • Shih-Hung Wu
  • Cheng-Wei Shih
  • Chia-Wei Wu
  • Richard Tzong-Han Tsai
  • Wen-Lian Hsu
Abstract

Recently, shallow parsing has been applied to various information processing systems, such as information retrieval, information extraction, question answering, and automatic document summarization. A shallow parser is suitable for online applications because it is much more efficient and less demanding than a full parser. In this research, we formulate shallow parsing as a sequential tagging problem and use a supervised machine learning technique, Maximum Entropy (ME), to build a Chinese shallow parser. The main features of the ME-based shallow parser are part-of-speech (POS) tags and the context words in a sentence. We adopt the shallow parsing results of Sinica Treebank as our standard, and select 30,000 and 10,000 sentences from Sinica Treebank as the training set and test set, respectively. We then test the robustness of the shallow parser with noisy data. The experimental results show that the proposed shallow parser is quite robust for sentences with unknown proper nouns.
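
The formulation described in the abstract (one chunk tag per token, predicted by a Maximum Entropy model from POS tags and context words) can be illustrated with a minimal sketch. This is not the authors' implementation: the B-/I- chunk tag scheme, the one-token context window, the simplified POS labels, the toy sentences, the training settings, and the names `token_features` and `MaxEntTagger` are all assumptions made for illustration, and each position is tagged independently.

```python
import math
from collections import defaultdict


def token_features(words, pos_tags, i):
    """Return word and POS features in a +/-1 window around position i.

    The window size and feature names are illustrative assumptions.
    """
    feats = []
    for offset in (-1, 0, 1):
        j = i + offset
        inside = 0 <= j < len(words)
        feats.append(f"w[{offset}]={words[j] if inside else '<PAD>'}")
        feats.append(f"pos[{offset}]={pos_tags[j] if inside else '<PAD>'}")
    return feats


class MaxEntTagger:
    """Toy maximum entropy (multinomial logistic) tagger over binary features."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        # Softmax over the summed feature weights of each label.
        scores = {y: sum(self.weights[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, examples, epochs=200, lr=0.2):
        # Stochastic gradient ascent on the conditional log-likelihood.
        for _ in range(epochs):
            for feats, gold in examples:
                p = self.probs(feats)
                for y in self.labels:
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.weights[(f, y)] += lr * grad

    def tag(self, words, pos_tags):
        # Tag each position independently with its most probable label.
        tags = []
        for i in range(len(words)):
            p = self.probs(token_features(words, pos_tags, i))
            tags.append(max(self.labels, key=lambda y: p[y]))
        return tags


# Hypothetical toy data: (words, simplified POS tags, chunk tags).
train_sents = [
    (["我", "喜歡", "紅", "蘋果"], ["N", "V", "A", "N"], ["B-NP", "B-VP", "B-NP", "I-NP"]),
    (["他", "買", "書"], ["N", "V", "N"], ["B-NP", "B-VP", "B-NP"]),
]

examples = [
    (token_features(words, pos, i), tags[i])
    for words, pos, tags in train_sents
    for i in range(len(words))
]

tagger = MaxEntTagger({tag for _, _, tags in train_sents for tag in tags})
tagger.train(examples)
predicted = [tagger.tag(words, pos) for words, pos, _ in train_sents]
```

Because POS and neighbouring-word features still fire when the current word was never seen in training, a model of this shape can fall back on context for unknown tokens, which is the kind of robustness the abstract reports for unknown proper nouns.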

Similar resources

Chinese Chunking Based on Maximum Entropy Markov Models

This paper presents a new Chinese chunking method based on maximum entropy Markov models. We first present two types of Chinese chunking specifications and data sets, to which the chunking models are applied. Then we describe the hidden Markov chunking model and the maximum entropy chunking model. Based on our analysis of the two models, we propose a maximum entropy Markov chunking model th...

SRCB-WSD: Supervised Chinese Word Sense Disambiguation with Key Features

This article describes the implementation of a Word Sense Disambiguation system that participated in the SemEval-2007 multilingual Chinese-English lexical sample task. We adopted a supervised learning approach with a Maximum Entropy classifier. The features used were neighboring words and their parts of speech, as well as single words in the context, and other syntactic features based on shallow par...

Applying Conditional Random Fields to Chinese Shallow Parsing

Chinese shallow parsing is a difficult, important, and widely studied sequence modeling problem. CRFs are new discriminative sequential models that can incorporate many rich features. This paper shows how conditional random fields (CRFs) can be efficiently applied to Chinese shallow parsing. We apply CRFs and HMMs to the same data set. Our results confirm that CRFs improve the performance ...

A Shallow Discourse Parsing System Based On Maximum Entropy Model

This paper describes our system for the CoNLL 2015 Shared Task on Shallow Discourse Parsing. We regard this as a classification task and build a cascaded system based on Maximum Entropy to identify the discourse connective, the spans of its two arguments, and the sense of the discourse connective. We trained the cascaded models with a variety of features, such as lexical and syntactic features. We also ...

Shallow Parsing with Conditional Random Fields

Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random fiel...

Publication date: 2005